{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "3f6c2bfe6b2e26c92357e896a1511195d836956e"
   },
   "source": [
    "<center>\n",
    "<img src=\"../../img/ods_stickers.jpg\">\n",
    "    \n",
    "## [mlcourse.ai](https://mlcourse.ai) - Open Machine Learning Course\n",
    "\n",
    "Author: [Yury Kashnitsky](https://www.linkedin.com/in/festline/). All content is distributed under the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "cb01ca96934e5c83a36a2308da9645b87a9c52a0"
   },
   "source": [
    "## <center> Assignment 4 (demo). Solution\n",
    "### <center>  Sarcasm detection with logistic regression\n",
    "    \n",
    "**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a4-demo-sarcasm-detection-with-logit) + [solution](https://www.kaggle.com/kashnitsky/a4-demo-sarcasm-detection-with-logit-solution).**\n",
    "\n",
    "\n",
    "We'll be using the dataset from the [paper](https://arxiv.org/abs/1704.05579) \"A Large Self-Annotated Corpus for Sarcasm\" with >1mln comments from Reddit, labeled as either sarcastic or not. A processed version can be found on Kaggle in a form of a [Kaggle Dataset](https://www.kaggle.com/danofer/sarcasm)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "ed87ab2845921166bb73ca854bfe1ef013c035e9"
   },
   "outputs": [],
   "source": [
    "PATH_TO_DATA = \"../input/sarcasm/train-balanced-sarcasm.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "ffa03aec57ab6150f9bec0fa56cd3a5791a3e6f4"
   },
   "outputs": [],
   "source": [
    "# some necessary imports\n",
    "import os\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from matplotlib import pyplot as plt\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import accuracy_score, confusion_matrix\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.pipeline import Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "b23e4fc7a1973d60e0c6da8bd60f3d921542a856"
   },
   "outputs": [],
   "source": [
    "train_df = pd.read_csv(PATH_TO_DATA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "4dc7b3787afa46c7eb0d0e33b0c41ab9821c4a27"
   },
   "outputs": [],
   "source": [
    "train_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "0a7ed9557943806c6813ad59c3d5ebdb403ffd78"
   },
   "outputs": [],
   "source": [
    "train_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "6472f52fb5ecb8bb2a6e3b292678a2042fcfe34c"
   },
   "source": [
    "Some comments are missing, so we drop the corresponding rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "97b2d85627fcde52a506dbdd55d4d6e4c87d3f08"
   },
   "outputs": [],
   "source": [
    "train_df.dropna(subset=[\"comment\"], inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "9d51637ee70dca7693737ad0da1dbb8c6ce9230b"
   },
   "source": [
    "We notice that the dataset is indeed balanced"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "addd77c640423d30fd146c8d3a012d3c14481e11"
   },
   "outputs": [],
   "source": [
    "train_df[\"label\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "5b836574e5093c5eb2e9063fefe1c8d198dcba79"
   },
   "source": [
    "We split data into training and validation parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "c200add4e1dcbaa75164bbcc73b9c12ecb863c96"
   },
   "outputs": [],
   "source": [
    "train_texts, valid_texts, y_train, y_valid = train_test_split(\n",
    "    train_df[\"comment\"], train_df[\"label\"], random_state=17\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "8ccf65310c5e61194dd6a8913276e54cc32e7712"
   },
   "source": [
    "## Tasks:\n",
    "1. Analyze the dataset, make some plots. This [Kernel](https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-qiqc) might serve as an example\n",
    "2. Build a Tf-Idf + logistic regression pipeline to predict sarcasm (`label`) based on the text of a comment on Reddit (`comment`).\n",
    "3. Plot the words/bigrams which a most predictive of sarcasm (you can use [eli5](https://github.com/TeamHG-Memex/eli5) for that)\n",
    "4. (optionally) add subreddits as new features to improve model performance. Apply here the Bag of Words approach, i.e. treat each subreddit as a new feature."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "ba1a8f65032c5954476a68e01b607655145b746d"
   },
   "source": [
    "### Part 1. Exploratory data analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "6a045e347fabd462a643639a1334b51f8780627d"
   },
   "source": [
    "Distribution of lengths for sarcastic and normal comments is almost the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "dadaf341602993a7854867a1df3004d0aa5d9b8c"
   },
   "outputs": [],
   "source": [
    "train_df.loc[train_df[\"label\"] == 1, \"comment\"].str.len().apply(np.log1p).hist(\n",
    "    label=\"sarcastic\", alpha=0.5\n",
    ")\n",
    "train_df.loc[train_df[\"label\"] == 0, \"comment\"].str.len().apply(np.log1p).hist(\n",
    "    label=\"normal\", alpha=0.5\n",
    ")\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "c2c613ee2052a2c0379682adf5c23d1f751f4c3b"
   },
   "outputs": [],
   "source": [
    "from wordcloud import STOPWORDS, WordCloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "ae7333d67f6a17673d2aa16aed3017e2fbef9b58"
   },
   "outputs": [],
   "source": [
    "wordcloud = WordCloud(\n",
    "    background_color=\"black\",\n",
    "    stopwords=STOPWORDS,\n",
    "    max_words=200,\n",
    "    max_font_size=100,\n",
    "    random_state=17,\n",
    "    width=800,\n",
    "    height=400,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "7a59f35dc359cb1b13a363b0f515b38a03c7b940"
   },
   "source": [
    "Word cloud are nice, but not very useful"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "7bc095d9b0c549d4a21478ed52845210c0ffbb57"
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(16, 12))\n",
    "wordcloud.generate(str(train_df.loc[train_df[\"label\"] == 1, \"comment\"]))\n",
    "plt.imshow(wordcloud);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "f2423d8c8adf818c0f7c709665f741e1f885e47f"
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(16, 12))\n",
    "wordcloud.generate(str(train_df.loc[train_df[\"label\"] == 0, \"comment\"]))\n",
    "plt.imshow(wordcloud);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "15c4140c01495b4d62a5b18d1906ba191e01505c"
   },
   "source": [
    "Let's analyze whether some subreddits are more \"sarcastic\" on average than others"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "6ba84720d54144e054b6963d78b48bf648e5c652"
   },
   "outputs": [],
   "source": [
    "sub_df = train_df.groupby(\"subreddit\")[\"label\"].agg([np.size, np.mean, np.sum])\n",
    "sub_df.sort_values(by=\"sum\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "575136be3a080cdd9a93ee7ee0c5fc9d9bf754e9"
   },
   "outputs": [],
   "source": [
    "sub_df[sub_df[\"size\"] > 1000].sort_values(by=\"mean\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "4e9cf503955b00599953ef8f87d0662155876519"
   },
   "source": [
    "The same for authors doesn't yield much insight. Except for the fact that somebody's comments were sampled - we can see the same amounts of sarcastic and non-sarcastic comments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "1cde1056ef0d3c62b98d70a299d83df29fcf6071"
   },
   "outputs": [],
   "source": [
    "sub_df = train_df.groupby(\"author\")[\"label\"].agg([np.size, np.mean, np.sum])\n",
    "sub_df[sub_df[\"size\"] > 300].sort_values(by=\"mean\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "46f4aa1c9524d525acec04ca0754f85a5190514b"
   },
   "outputs": [],
   "source": [
    "sub_df = (\n",
    "    train_df[train_df[\"score\"] >= 0]\n",
    "    .groupby(\"score\")[\"label\"]\n",
    "    .agg([np.size, np.mean, np.sum])\n",
    ")\n",
    "sub_df[sub_df[\"size\"] > 300].sort_values(by=\"mean\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "3e2f474c6219798c2fef2e4004ad4a6c1022f8ee"
   },
   "outputs": [],
   "source": [
    "sub_df = (\n",
    "    train_df[train_df[\"score\"] < 0]\n",
    "    .groupby(\"score\")[\"label\"]\n",
    "    .agg([np.size, np.mean, np.sum])\n",
    ")\n",
    "sub_df[sub_df[\"size\"] > 300].sort_values(by=\"mean\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "416321f19f5a27290bc5622e8b3384b7bbbd28c6"
   },
   "source": [
    "### Part 2. Training the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "3048a070a56b08eb4e5fe2c54b6d14905031e74a"
   },
   "outputs": [],
   "source": [
    "# build bigrams, put a limit on maximal number of features\n",
    "# and minimal word frequency\n",
    "tf_idf = TfidfVectorizer(ngram_range=(1, 2), max_features=50000, min_df=2)\n",
    "# multinomial logistic regression a.k.a softmax classifier\n",
    "logit = LogisticRegression(C=1, n_jobs=4, solver=\"lbfgs\", random_state=17, verbose=1)\n",
    "# sklearn's pipeline\n",
    "tfidf_logit_pipeline = Pipeline([(\"tf_idf\", tf_idf), (\"logit\", logit)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "8756bac7457218e4daf08ec276211f03971c17fb"
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "tfidf_logit_pipeline.fit(train_texts, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "d2e47f77f999c2fb5aee9ef1de1542bc93de4c98"
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "valid_pred = tfidf_logit_pipeline.predict(valid_texts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "a8f93efc3db12910eaa6d7944feebb2418714203"
   },
   "outputs": [],
   "source": [
    "accuracy_score(y_valid, valid_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "fefd0178f43ce832031653be70f0a0e47f62cf4c"
   },
   "source": [
    "### Part 3. Explaining the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "247a13fd3ae4d5c015c0ca0489a9a95d72ad7e9f"
   },
   "outputs": [],
   "source": [
    "def plot_confusion_matrix(\n",
    "    actual,\n",
    "    predicted,\n",
    "    classes,\n",
    "    normalize=False,\n",
    "    title=\"Confusion matrix\",\n",
    "    figsize=(7, 7),\n",
    "    cmap=plt.cm.Blues,\n",
    "    path_to_save_fig=None,\n",
    "):\n",
    "    \"\"\"\n",
    "    This function prints and plots the confusion matrix.\n",
    "    Normalization can be applied by setting `normalize=True`.\n",
    "    \"\"\"\n",
    "    import itertools\n",
    "\n",
    "    from sklearn.metrics import confusion_matrix\n",
    "\n",
    "    cm = confusion_matrix(actual, predicted).T\n",
    "    if normalize:\n",
    "        cm = cm.astype(\"float\") / cm.sum(axis=1)[:, np.newaxis]\n",
    "\n",
    "    plt.figure(figsize=figsize)\n",
    "    plt.imshow(cm, interpolation=\"nearest\", cmap=cmap)\n",
    "    plt.title(title)\n",
    "    plt.colorbar()\n",
    "    tick_marks = np.arange(len(classes))\n",
    "    plt.xticks(tick_marks, classes, rotation=90)\n",
    "    plt.yticks(tick_marks, classes)\n",
    "\n",
    "    fmt = \".2f\" if normalize else \"d\"\n",
    "    thresh = cm.max() / 2.0\n",
    "    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n",
    "        plt.text(\n",
    "            j,\n",
    "            i,\n",
    "            format(cm[i, j], fmt),\n",
    "            horizontalalignment=\"center\",\n",
    "            color=\"white\" if cm[i, j] > thresh else \"black\",\n",
    "        )\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.ylabel(\"Predicted label\")\n",
    "    plt.xlabel(\"True label\")\n",
    "\n",
    "    if path_to_save_fig:\n",
    "        plt.savefig(path_to_save_fig, dpi=300, bbox_inches=\"tight\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "ff4a79b0368176a518fb0b84b45a508499e6183f"
   },
   "source": [
    "Confusion matrix is quite balanced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "6df0c058a45b48b756e57e01a23bbc0974407195"
   },
   "outputs": [],
   "source": [
    "plot_confusion_matrix(\n",
    "    y_valid,\n",
    "    valid_pred,\n",
    "    tfidf_logit_pipeline.named_steps[\"logit\"].classes_,\n",
    "    figsize=(8, 8),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "6af3a7c93afef23ce9d215bf1daa2c91feb57d5d"
   },
   "source": [
    "Indeed, we can recognize some phrases indicative of sarcasm. Like \"yes sure\". "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "f62f3043b6e94fb6bbd5683a0e9662c572847fa6"
   },
   "outputs": [],
   "source": [
    "import eli5\n",
    "\n",
    "eli5.show_weights(\n",
    "    estimator=tfidf_logit_pipeline.named_steps[\"logit\"],\n",
    "    vec=tfidf_logit_pipeline.named_steps[\"tf_idf\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "be94f2065f86c65d5ee5590c9b2e5a541135732c"
   },
   "source": [
    "So sarcasm detection is easy.\n",
    "<img src=\"https://habrastorage.org/webt/1f/0d/ta/1f0dtavsd14ncf17gbsy1cvoga4.jpeg\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "5648f6ad7a14ef3a582909f7c0c72c4fc80204aa"
   },
   "source": [
    "### Part 4. Improving the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "aaefd2eb6f829f9eb9e9cd12c7903d3086182acc"
   },
   "outputs": [],
   "source": [
    "subreddits = train_df[\"subreddit\"]\n",
    "train_subreddits, valid_subreddits = train_test_split(subreddits, random_state=17)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "4a0dac5edc6b5c4078622637454baa820bc18a5f"
   },
   "source": [
    "We'll have separate Tf-Idf vectorizers for comments and for subreddits. It's possible to stick to a pipeline as well, but in that case it becomes a bit less straightforward. [Example](https://stackoverflow.com/questions/36731813/computing-separate-tfidf-scores-for-two-different-columns-using-sklearn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "88be690ae260d824fbd8df4a4d02e5abcce0d5a7"
   },
   "outputs": [],
   "source": [
    "tf_idf_texts = TfidfVectorizer(ngram_range=(1, 2), max_features=50000, min_df=2)\n",
    "tf_idf_subreddits = TfidfVectorizer(ngram_range=(1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "453f8bc16c7726cbff936f3170442341eca3b45e"
   },
   "source": [
    "Do transformations separately for comments and subreddits. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "bfd35a513a3a090485df667e8ec773c57682aae7"
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "X_train_texts = tf_idf_texts.fit_transform(train_texts)\n",
    "X_valid_texts = tf_idf_texts.transform(valid_texts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "bea0b7a09c57afefb77e5eabd2b8ed18bf019855"
   },
   "outputs": [],
   "source": [
    "X_train_texts.shape, X_valid_texts.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "b81a4ad703e2fb0f97f21a449af7e0d8b39a860c"
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "X_train_subreddits = tf_idf_subreddits.fit_transform(train_subreddits)\n",
    "X_valid_subreddits = tf_idf_subreddits.transform(valid_subreddits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "d55aec9753a506f41310b4b717cdbb8694af8a8e"
   },
   "outputs": [],
   "source": [
    "X_train_subreddits.shape, X_valid_subreddits.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "74b044c4b73653181ddde0f32e65a14d4867319c"
   },
   "source": [
    "Then, stack all features together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "bc458646e3c36792cf845f86cb9b6d6cc384b32c"
   },
   "outputs": [],
   "source": [
    "from scipy.sparse import hstack\n",
    "\n",
    "X_train = hstack([X_train_texts, X_train_subreddits])\n",
    "X_valid = hstack([X_valid_texts, X_valid_subreddits])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "d5d2505567c9303b2ce9f81c9bdc11fc799af91e"
   },
   "outputs": [],
   "source": [
    "X_train.shape, X_valid.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "a7f2040b76acbf2ff58e5dc68d7437ee6c3e9989"
   },
   "source": [
    "Train the same logistic regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "77b08444a449def113803365001d1f844620d5aa"
   },
   "outputs": [],
   "source": [
    "logit.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "1a48a2e637db7e695766af5be6f5aa60596f6ad9"
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "valid_pred = logit.predict(X_valid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "b50af0bc6ea1297538daaff183b7eecf8dfa7c2c"
   },
   "outputs": [],
   "source": [
    "accuracy_score(y_valid, valid_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "5fe6b371e91831198dda10d8b198c9e1ebfe07aa"
   },
   "source": [
    "As we can see, accuracy slightly increased."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "7f0f47b98e49a185cd5cffe19fcbe28409bf00c0"
   },
   "source": [
    "## Links:\n",
    "  - Machine learning library [Scikit-learn](https://scikit-learn.org/stable/index.html) (a.k.a. sklearn)\n",
    "  - Kernels on [logistic regression](https://www.kaggle.com/kashnitsky/topic-4-linear-models-part-2-classification) and its applications to [text classification](https://www.kaggle.com/kashnitsky/topic-4-linear-models-part-4-more-of-logit), also a [Kernel](https://www.kaggle.com/kashnitsky/topic-6-feature-engineering-and-feature-selection) on feature engineering and feature selection\n",
    "  - [Kaggle Kernel](https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle) \"Approaching (Almost) Any NLP Problem on Kaggle\"\n",
    "  - [ELI5](https://github.com/TeamHG-Memex/eli5) to explain model predictions"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
